Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43014: Reduce alert stream credentials timeout #195

Merged
merged 1 commit into from
Mar 22, 2024

Conversation

bsmartradio
Copy link
Contributor

@bsmartradio bsmartradio commented Mar 9, 2024

The new function _check_server pings the kafka server and checks if there are any available topics. If the server is not contactable or if the server topics are empty, it raises and error and stops the pipeline.

# should never be committed to source code.
"sasl.username": self.username,
"sasl.password": self.password,
# Batch size limits the largest size of a kafka alert that can be sent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove leftover comment? There's no config here for size.

@@ -293,6 +313,7 @@ def produceAlerts(self, alerts, ccdVisitId):
ccdVisitId of the alerts sent to the alert stream. Used to write
out alerts which fail to be sent to the alert stream.
"""
self._server_check()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can eventually drop these and just use the server check on startup, as this adds latency. But let's leave it for now while we get things running.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can afford 2×0.5 seconds, and can always tighten the timeout a little. I'm a bit disappointed that this won't help with interrupted connections, but I'd bet it's still a net gain on expected running time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working on a solution for interrupted connections, but figured it was important to get this in place for the test/code freeze.


except KafkaException as e:
self.log.error(e)
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the log-and-reraise here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing some functionality of raise, does raise log the error automatically?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All our execution frameworks should log any exceptions that get emitted from the pipeline (certainly Prompt Processing does!). It's generally considered bad practice to log an exception you're not handling, not least because of what happens if everybody does it. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've swapped it to

except KafkaException as exception:
          raise exception

However, if I didn't include the as expection part and just did raise by itself, would that catch the full error correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would, but I don't think you need that at all... you could remove that whole try-except block and the behavior would be the same. (It might be helpful to document that KafkaException is expected; that usually goes in the docstring.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that is very true. I'll add a docstring (something I should have done anyways) and remove the try/except.

Add check_server function to check whether or not the server is contactable. Update unit tests
@bsmartradio bsmartradio merged commit 7bcfb5c into main Mar 22, 2024
2 checks passed
@bsmartradio bsmartradio deleted the tickets/DM-43014 branch March 22, 2024 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants